﻿<!DOCTYPE html>
<html>
<head>
	<title>A vs An - Determine english indeterminate article</title>
	<link rel="shortcut icon" href="emnicon.ico" />
	<meta content="text/html; charset=UTF-8" http-equiv="content-type" />
	<style type="text/css">
		#article {
			min-width: 3em;
		}

		body {
			font: 11pt Verdana, sans-serif;
		}
	</style>
</head>
<body>
	<p><i>It's <b>an</b> unanticipated result, but it's <b>a</b> unanimous result...</i></p>
	<p>It is <span id="article">a</span>
		<input id="searchquery" type="text" name="q" placeholder="enter word here" autofocus="autofocus" />
		<span id="articleDetail">&nbsp;</span></p>
	<h3>Details: </h3>
	<p>This page determines whether "a" or "an" should precede a word.  It does this using the method described in <a href="http://stackoverflow.com/questions/1288291/how-can-i-correctly-prefix-a-word-with-a-and-an/1288473#1288473">this stackoverflow response</a>.  The dataset used is the wikipedia-article-text dump.  Some additional preprocessing was done to remove as much wiki-markup as possible and extract only things vaguely resembling sentences using regular expressions.  If the word following 'a' or 'an' started with a quote or parenthesis, the initial quote or parenthesis was ignored.  The resulting prefix-list with the code to query it is less than 10KB in size; excluding the actual counts would reduce the size still further.
	</p>
	<p>Try...</p>
	<ul>
		<li>unanticipated result</li>
		<li>unanimous vote</li>
		<li>honest decision</li>
		<li>honeysuckle shrub</li>
		<li>0800 number</li>
		<li>∞ of oregano</li>
		<li>NASA scientist</li>
		<li>NSA analyst</li>
		<li>FIAT car</li>
		<li>FAA policy</li>
	</ul>
	<p>You may use, modify, redistribute and do whatever you want with the data+script used on this page, but please don't misrepresent its source (license: <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 2.0</a>).  If you've any questions, you can mail me at &lt;firstname&gt;@&lt;lastname&gt;.org.  
	</p>
	<p>Downloads:</p>
	<ul>
		<li><b>.NET</b> <a href="https://www.nuget.org/packages/AvsAn/">nuget package AvsAn</a> (.NET 2.0 or later); <a href="http://code.google.com/p/a-vs-an/">C# is source on code.google.com/p/a-vs-an</a>; as are <a href="http://code.google.com/p/a-vs-an/downloads/list">the binaries</a>.</li>
		<li><b>JS:</b> Variant including counts of a's and an's : <a href="AvsAn.js">AvsAn.js</a> (10111 bytes, minified+gzipped it's 5434 bytes)</li>
		<li><b>JS:</b> Variant including only which article is more common: <a href="AvsAn-simple.js">AvsAn-simple.js</a> (4823 bytes, minified+gzipped it's 2553 bytes)</li>
		<li><b>node.js package:</b> (alternative implementation by Chad Kirby) <a href="https://github.com/uplake/Articles">github: uplake / Articles</a></li>
	</ul>
	The implementations are efficient: on a single thread of a 2.5GHz Q9300 a benchmark classifying all words of an english dictionary achieves about 15 million words a second; that's just 166 clock cycles per word. The javascript implementations were benchmarked on chrome 26, firefox 22, IE 10, and opera 12, and are about 5-10 times slower, at approximately 2 million classifications per second. 
	<p>--Eamon Nerbonne
	</p>
</body>
<script src="AvsAn.js"></script>
<script>
	(function () {
		"use strict";
		var articleEl = document.getElementById("article");
		var articleDetailEl = document.getElementById("articleDetail");
		var inputEl = document.getElementById("searchquery");

		inputEl.onkeyup = inputEl.oninput = function () {
			var input = document.getElementById("searchquery").value.replace(/^\s+|\s+$/g, "");
			input = input.replace(/^[\(\"'“‘-]/, ""); //strip initial punctuation symbols
			var res = AvsAn.query(input);
			articleEl.replaceChild(document.createTextNode(res.article), articleEl.firstChild);
			articleDetailEl.replaceChild(document.createTextNode("prefix: \""+res.prefix +"\" ("+ res.aCount + " vs. " + res.anCount + ")"), articleDetailEl.firstChild);
		}
	})();
</script>

</html>
